Synth training - Migratory policy and violence in Chile

Setup

#install.packages("Synth")
#install.packages("ggplot2")
#install.packages("skimr")
#install.packages("dplyr")
#install.packages("knitr")
library(Synth)
##
## Synth Package: Implements Synthetic Control Methods.
## See https://web.stanford.edu/~jhain/synthpage.html for additional information.
library(skimr)
library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(htmltools)
library(knitr)

Motivation

There is a strong sense of insecurity in Chile

This is backed up by the latest figures on violence

Insecurity is often attributed to migration without much evidence, leading to political repercussions

Local politicians target the 2019 policy to ease entry of Venezuelan migrants

Data exploration

southamerica <- read.csv("southamerica.csv")

Variable description

  • country: country label for each observation.

  • country_id: a numeric label for each country.

  • year: year of observation.

  • hdi: human development index in year t.

  • gini: gini coefficient in year t.

  • hom: homicides per 100,000 people in year t.

  • ppincppp: GDP per capita, PPP (current international $) in year t.

  • pov: poverty headcount ratio at $6.85 a day (2017 PPP) (% of population) in year t.

  • homale: homicides of males per 100,000 people in year t.

  • linc: natural logarithm of variable ‘ppincppp’.

  • lhom: natural logarithm of variable ‘hom’.

  • lhomale: natural logarithm of variable ‘homale’
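The three log variables can be reproduced directly in R. The sketch below is only an illustration, since the workflow in this document creates them in Stata. The toy data frame is an assumption: it holds just the minimum and maximum of ppincppp, hom and homale as reported in the summary statistics of this document, so the results can be compared with the reported minima and maxima of linc, lhom and lhomale.

```r
# Toy data frame (illustration only): minimum and maximum of each raw
# variable, copied from the skim() summary of southamerica.
toy <- data.frame(
  ppincppp = c(2294.73, 28337.07),
  hom      = c(0.97, 85.43),
  homale   = c(3.76, 159.00)
)

# log() in R is the natural logarithm by default.
toy$linc    <- log(toy$ppincppp)
toy$lhom    <- log(toy$hom)
toy$lhomale <- log(toy$homale)

print(round(toy, 2))
```

Because the logarithm is monotone, the log of each minimum and maximum should agree, up to rounding, with the p0 and p100 values of linc, lhom and lhomale.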

Summary statistics

skim(southamerica)
Data summary
Name southamerica
Number of rows 384
Number of columns 12
_______________________
Column type frequency:
character 1
numeric 11
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
country 0 1 4 9 0 12 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1 2005.50 9.25 1990.00 1997.75 2005.50 2013.25 2021.00 ▇▇▇▇▇
hdi 0 1 0.71 0.07 0.51 0.66 0.71 0.76 0.86 ▁▃▇▇▃
gini 0 1 50.31 5.47 39.50 45.23 50.12 54.72 61.60 ▃▇▅▆▅
hom 0 1 16.59 15.26 0.97 6.18 12.16 20.84 85.43 ▇▂▁▁▁
ppincppp 0 1 10908.37 5398.27 2294.73 6828.28 10114.63 14038.47 28337.07 ▇▇▃▂▁
pov 0 1 35.14 25.61 -216.05 22.46 39.12 48.16 94.63 ▁▁▁▅▇
homale 0 1 29.42 28.90 3.76 10.21 19.52 38.37 159.00 ▇▂▁▁▁
country_id 0 1 6.50 3.46 1.00 3.75 6.50 9.25 12.00 ▇▅▅▅▇
linc 0 1 9.17 0.52 7.74 8.83 9.22 9.55 10.25 ▁▃▇▇▃
lhom 0 1 2.47 0.82 -0.03 1.82 2.50 3.04 4.45 ▁▅▇▇▂
lhomale 0 1 3.00 0.87 1.32 2.32 2.97 3.65 5.07 ▅▇▇▅▂
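
The pov row reports a minimum of -216.05, which is not a valid value for a percentage of the population. A simple range check can surface such rows before modelling. The sketch below is a minimal example under stated assumptions: flag_out_of_range is a hypothetical helper, not part of any package, and the toy data frame is made up for illustration.

```r
# Hypothetical helper: return the rows where `column` falls outside
# the closed interval [lower, upper]. Missing values are not flagged.
flag_out_of_range <- function(data, column, lower, upper) {
  values <- data[[column]]
  out_of_range <- !is.na(values) & (values < lower | values > upper)
  data[out_of_range, , drop = FALSE]
}

# Toy rows for illustration only (not the real data).
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  pov     = c(35.1, -216.05, 48.2, 120.0)
)

# A headcount ratio in percent should lie between 0 and 100.
bad_rows <- flag_out_of_range(toy, "pov", 0, 100)
print(bad_rows)
```

On the real data the equivalent call would be flag_out_of_range(southamerica, "pov", 0, 100).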

Chile shows an increase in homicides in recent years

chl_flt <- southamerica %>% filter(country == "Chile")
ggplot(chl_flt, aes(x = year, y = hom, group = country, color = country)) +
  geom_line() +
  labs(title = "Homicides per 100,000 in Chile (1990-2021)",
       x = "Year",
       y = "Homicides per 100,000 people") +
  theme_minimal()

ggplot(chl_flt, aes(x = year, y = homale, group = country, color = country)) +
  geom_line() +
  labs(title = "Homicides of males per 100,000 in Chile (1990-2021)",
       x = "Year",
       y = "Homicides of males per 100,000 people") +
  theme_minimal()

Workflow

  • Data collection from UN and World Bank datasets.

  • Imputation of missing observations by country using MICE:

    • Method explanation here.

    • Python notebook with imputation code here.

  • Data wrangling in Stata to create natural logarithm variables.

  • Synthetic control done in R.
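
The last step of the workflow can be sketched with the Synth functions dataprep(), synth() and path.plot(). This is only a rough outline, not the specification used in this training. Everything specific is an assumption: the wrapper name run_synth_sketch, the predictor set (hdi, gini, linc), the outcome (lhomale), the treatment year (2019, taken from the policy date mentioned in the motivation) and the use of all other countries as the donor pool.

```r
# Sketch only: predictors, outcome and time windows are assumptions.
run_synth_sketch <- function(panel,
                             treated_name = "Chile",
                             treat_year = 2019,
                             outcome = "lhomale",
                             predictors = c("hdi", "gini", "linc")) {
  panel <- as.data.frame(panel)
  # dataprep() expects the unit names column to be character.
  panel$country <- as.character(panel$country)

  # Look up the treated unit's numeric id instead of hard-coding it.
  treated_id <- unique(panel$country_id[panel$country == treated_name])
  stopifnot(length(treated_id) == 1)
  # base:: prefix avoids the dplyr-masked setdiff().
  control_ids <- base::setdiff(unique(panel$country_id), treated_id)

  # Pre-treatment window: first year in the panel up to the year before treatment.
  pre_years <- min(panel$year):(treat_year - 1)

  prep <- Synth::dataprep(
    foo = panel,
    predictors = predictors,
    predictors.op = "mean",
    time.predictors.prior = pre_years,
    dependent = outcome,
    unit.variable = "country_id",
    unit.names.variable = "country",
    time.variable = "year",
    treatment.identifier = treated_id,
    controls.identifier = control_ids,
    time.optimize.ssr = pre_years,
    time.plot = min(panel$year):max(panel$year)
  )
  fit <- Synth::synth(data.prep.obj = prep)
  list(prep = prep, fit = fit)
}

# Runs only if the package and the data file are available.
if (requireNamespace("Synth", quietly = TRUE) && file.exists("southamerica.csv")) {
  southamerica <- read.csv("southamerica.csv")
  res <- run_synth_sketch(southamerica)
  Synth::path.plot(synth.res = res$fit, dataprep.res = res$prep,
                   tr.intake = 2019,
                   Ylab = "Ln homicides of males per 100,000",
                   Xlab = "Year")
}
```

Looking up the treated id from the country label keeps the sketch independent of how country_id happens to be coded in the file.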

Limitations

  • Lack of literature review.

  • Imputation with MICE was run without any restrictions, even though some variables have only a few observed values.

  • Potential collinearity, heteroskedasticity, and endogeneity.

  • Poverty variable left out of the analysis because its imputed values are extremely high for some countries.

  • Dependent variable covers only homicides of males, not other crimes that would be useful to analyze: theft, drug-related crime, organized crime.

  • Limitation when interpreting Synth results.